4 research outputs found

Architecture-Aware Optimization on a 1600-core Graphics Processor

Author: Daga Mayank
Feng Wu-chun
Scogland Thomas R.W.
Publication date: 01/07/2011

The graphics processing unit (GPU) continues to make significant strides as an accelerator in commodity cluster computing for high-performance computing (HPC). For example, three of the top five fastest supercomputers in the world, as ranked by the TOP500, employ GPUs as accelerators. Despite this increasing interest in GPUs, however, optimizing the performance of a GPU-accelerated compute node requires deep technical knowledge of the underlying architecture. Although significant literature exists on how to optimize GPU performance on the more mature NVIDIA CUDA architecture, the converse is true for OpenCL on the AMD GPU. Consequently, we present and evaluate architecture-aware optimizations for the AMD GPU. The most prominent optimizations include (i) explicit use of registers, (ii) use of vector types, (iii) removal of branches, and (iv) use of image memory for global data. We demonstrate the efficacy of our AMD GPU optimizations by applying each optimization in isolation as well as in concert to a large-scale molecular modeling application called GEM. Via these AMD-specific GPU optimizations, the AMD Radeon HD 5870 GPU delivers 65% better performance than with the well-known NVIDIA-specific optimizations.

Computer Science Technical Reports @Virginia Tech

CoreTSAR: Core Task-Size Adapting Runtime

Author: De Supinski Bronis R.
Feng Wu-chun
Rountree Barry
Scogland Thomas R.W.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2015

Queen's University Belfast Research Portal

CoreTSAR: Core Task-Size Adapting Runtime

Author: Barry Rountree
Bronis R. de Supinski
Thomas R.W. Scogland
Wu-chun Feng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Extending OpenMP to Facilitate Loop Optimization

Author: Bertolacci Ian
Davis Eddie C.
de Supinski Bronis R.
Mills Strout Michelle
Olschanowsky Catherine
Scogland Thomas R.W.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

OpenMP provides several mechanisms to specify parallel source-code transformations. Unfortunately, many compilers perform these transformations early in the translation process, often before performing traditional sequential optimizations, which can limit the effectiveness of those optimizations. Further, OpenMP semantics preclude performing those transformations in some cases prior to the parallel transformations, which can limit overall application performance. In this paper, we propose extensions to OpenMP that require the application of traditional sequential loop optimizations. These extensions can be specified to apply before, as well as after, other OpenMP loop transformations. We discuss limitations implied by existing OpenMP constructs as well as some previously proposed (parallel) extensions to OpenMP that could benefit from constructs that explicitly apply sequential loop optimizations. We present results that explore how these capabilities can lead to as much as a 20% improvement in parallel loop performance by applying common sequential loop optimizations.

Crossref

Boise State University - ScholarWorks